Search CORE

23 research outputs found

Using machine learning to predict pathogenicity of genomic variants throughout the human genome

Author: Rentzsch Philipp
Publication venue: Humboldt-Universität zu Berlin
Publication date: 14/04/2023

More than 6,000 diseases are estimated to be caused by genomic variants. This can happen in many ways: a variant may stop the translation of a protein, interfere with gene regulation, or shift splicing of the transcribed mRNA toward an unwanted isoform. All of these processes must be investigated to determine which variant may be causal for the deleterious phenotype. Variant effect scores help automate this task. Implemented as machine learning classifiers, they integrate annotations from different resources to rank genomic variants by pathogenicity. Developing a variant effect score requires multiple steps: annotation of the training data, feature selection, model training, benchmarking, and finally deployment of the model for application. Here, I present a generalized workflow for this process. It makes it simple to configure how information is converted into model features, enabling rapid exploration of different annotations. The workflow further implements hyperparameter optimization, model validation, and ultimately deployment of a selected model via genome-wide scoring of genomic variants. The workflow is applied to train Combined Annotation Dependent Depletion (CADD), a variant effect model that scores SNVs and InDels genome-wide. I show that the workflow can be quickly adapted to novel annotations by porting CADD to the genome reference GRCh38, establishing the first variant effect model for that reference. Further, I demonstrate the integration of deep neural network scores as features into a new CADD model, improving the prediction of RNA splicing. Finally, I apply the workflow to train multiple variant effect models from training data based on variants selected by allele frequency. In conclusion, the developed workflow is a flexible and scalable method for training variant effect scores.
All software and developed scores are freely available from cadd.gs.washington.edu and cadd.bihealth.org.

Dokumenten-Publikationsserver der Humboldt-Universität zu Berlin

CADD-Splice—improving genome-wide variant effect prediction using deep learning-derived splice scores

Author: Kircher Martin
Rentzsch Philipp
Schubach Max
Shendure Jay
Publication date: 01/01/2021

Background: Splicing of genomic exons into mRNAs is a critical prerequisite for the accurate synthesis of human proteins. Genetic variants impacting splicing underlie a substantial proportion of genetic disease, but are challenging to identify beyond those occurring at donor and acceptor dinucleotides. To address this, various methods aim to predict variant effects on splicing. Recently, deep neural networks (DNNs) have been shown to achieve better results in predicting splice variants than other strategies. Methods: It has been unclear how best to integrate such process-specific scores into genome-wide variant effect predictors. Here, we use a recently published experimental data set to compare several machine learning methods that score variant effects on splicing. We integrate the best of those approaches into general variant effect prediction models and observe the effect on classification of known pathogenic variants. Results: We integrate two specialized splicing scores into CADD (Combined Annotation Dependent Depletion; cadd.gs.washington.edu), a widely used tool for genome-wide variant effect prediction that we previously developed to weight and integrate diverse collections of genomic annotations. With this new model, CADD-Splice, we show that inclusion of splicing DNN effect scores substantially improves predictions across multiple variant categories, without compromising overall performance. Conclusions: While splice effect scores show superior performance on splice variants, specialized predictors cannot compete with other variant scores in general variant interpretation, as the latter account for nonsense and missense effects that do not alter splicing. Although only shown here for splice scores, we believe that the applied approach will generalize to other specific molecular processes, providing a path for the further improvement of genome-wide variant effect prediction.

Institutional Repository of the Freie Universität Berlin

The analysis of heterotaxy patients reveals new loss-of-function variants of GRK5

Author: Bauer Ulrike M. M.
Burkhalter Martin D.
Hitz Marc-Phillip
Kubisch Christian
Lessel Davor
Moepps Barbara
Muhammad Tariq
Philipp Melanie
Rentzsch Axel
Schalinski Adelheid
Schubert Stephan
Tena Teresa Casar
Toka Okan
Ware Stephanie M.
Publication venue: Springer Science and Business Media LLC
Publication date: 13/09/2016

G protein-coupled receptor kinase 5 (GRK5) is a regulator of cardiac performance and a potential therapeutic target in heart failure in the adult. Additionally, we have previously classified GRK5 as a determinant of left-right asymmetry and proper heart development using zebrafish. We thus aimed to identify GRK5 variants of functional significance by analysing 187 individuals with laterality defects (heterotaxy) that were associated with a congenital heart defect (CHD). Using Sanger sequencing, we identified two moderately frequent variants in GRK5 with minor allele frequencies <10%, and seven very rare polymorphisms with minor allele frequencies <1%, two of which are novel variants. Given their evolutionarily conserved position in zebrafish, in-depth functional characterisation of four variants (p.Q41L, p.G298S, p.R304C and p.T425M) was performed. We tested the effects of these variants on normal subcellular localisation and the ability to desensitise receptor signalling, as well as their ability to correct the left-right asymmetry defect upon Grk5l knockdown in zebrafish. While p.Q41L, p.R304C and p.T425M responded normally in the first two aspects, neither p.Q41L nor p.R304C was capable of rescuing the lateralisation phenotype. The fourth variant, p.G298S, was identified as a complete loss-of-function variant in all assays and provides insight into the functions of GRK5.

IUPUIScholarWorks

PubMed Central

CADD: predicting the deleteriousness of variants throughout the human genome

Author: Adzhubei
Arciero
Bouaoun
Bowling
Casper
Chintalapati
Cooper
Cuperus
Daniela Witten
Davydov
Ewing
Findlay
Franc
Ganel
Ghosh
Grantham
Gray
Gregory M Cooper
Groß
Herrero
Holstege
Huang
Ioannidis
Ionita-Laza
Itan
Iulio
Jagadeesh
Jay Shendure
Kichaev
Kircher
Knecht
Landrum
Lee
Lek
Li
Liu
Low
Martin Kircher
McCoy
McLaren
Ng
Oliphant
Patwardhan
Pedregosa
Philipp Rentzsch
Pollard
Quang
Racimo
Ruffier
Shendure
Siepel
Smedley
Starita
Stenson
Sundaram
van der Velde
Wang
Watanabe
Xiong
Zhang
Zhou
Publication date: 01/01/2018

Combined Annotation-Dependent Depletion (CADD) is a widely used measure of variant deleteriousness that can effectively prioritize causal variants in genetic analyses, particularly highly penetrant contributors to severe Mendelian disorders. CADD is an integrative annotation built from more than 60 genomic features, and can score human single nucleotide variants and short insertions and deletions anywhere in the reference assembly. CADD uses a machine learning model trained on a binary distinction between simulated de novo variants and variants that have arisen and become fixed in human populations since the split between humans and chimpanzees; the former are free of selective pressure and may thus include both neutral and deleterious alleles, while the latter are overwhelmingly neutral (or, at most, weakly deleterious) by virtue of having survived millions of years of purifying selection. Here we review the latest updates to CADD, including the most recent version, 1.4, which supports the human genome build GRCh38. We also present updates to our website that include simplified variant lookup, extended documentation, an Application Program Interface and improved mechanisms for integrating CADD scores into other tools or applications. CADD scores, software and documentation are available at https://cadd.gs.washington.edu.

Institutional Repository of the Freie Universität Berlin

Crossref

Binding between Crossveinless-2 and Chordin Von Willebrand Factor Type C Domains Promotes BMP Signaling by Blocking Chordin Activity

Author: A Nasevicius
AL Ambrosio
CA Conley
Carl-Philipp Heisenberg
D Ben-Zvi
D Umulis
Daria Graziussi
DS Wagner
E Bier
E Coles
E Decotto
EM De Robertis
F Rentzsch
HL Ashe
J Larrain
J Lin
J Xie
J-S Joly
Jin-Li Zhang
JL Zhang
Li-Yan Qiu
Lucy J. Patterson
M Hammerschmidt
M Ikeya
M Kamimura
M Matsui
M Moser
M Oelgeschlager
M Serpe
Matthias Hammerschmidt
MB O'Connor
MC Mullins
ME Binnerts
O Muraoka
O Shimmi
R Kelley
RAW Rupp
S Dal-Pra
S Fisher
S Keller
S Piccolo
S Schulte-Merker
SC Little
SS Blair
T Fujisawa
T Kirsch
VE Miller-Bertoglio
Walter Sebald
Y Kishimoto
Y Li
Y Sasai
Publication venue: Public Library of Science
Publication date: 01/01/2010

BACKGROUND: Crossveinless-2 (CV2) is an extracellular BMP modulator protein of the Chordin family, which can either enhance or inhibit BMP activity. CV2 binds to BMP2 via subdomain 1 of the first of its five N-terminal von Willebrand factor type C domains (VWC1). Previous studies showed that this BMP binding is required for the anti-, but not for the pro-BMP effect of CV2. More recently, it was shown that CV2 can also bind to the BMP inhibitor Chordin. However, it remained unclear which domains mediate this binding, and whether it accounts for an anti- or pro-BMP effect. PRINCIPAL FINDINGS: Here we report that a composite interface of CV2 consisting of subdomain 2 of VWC1 and of VWC2-4, which are dispensable for BMP binding, binds to the VWC2 domain of Chordin. Functional data obtained in zebrafish embryos indicate that this binding of Chordin is required for CV2's pro-BMP effect, which actually is an anti-Chordin effect and, at least to a large extent, independent of Tolloid-mediated Chordin degradation. We further demonstrate that CV2 mutant versions that per se are incapable of BMP binding can attenuate the Chordin/BMP interaction. CONCLUSIONS: We have physically dissected the anti- and pro-BMP effects of CV2. Its anti-BMP effect is obtained by binding to BMP via subdomain 1 of the VWC1 domain, a binding that occurs in competition with Chordin. In contrast, its pro-BMP effect is achieved by direct binding to Chordin via subdomain 2 of VWC1 and VWC2-4. This binding seems to induce conformational changes within the Chordin protein that weaken Chordin's affinity for BMP. We propose that in ternary Chordin-CV2-BMP complexes, both BMP and Chordin are directly associated with CV2, whereas Chordin is pushed away from BMP, ensuring that BMPs can be more easily delivered to their receptors.

Public Library of Science (PLOS)

Crossref

Kölner UniversitätsPublikationsServer

Directory of Open Access Journals

PubMed Central

p53 and TAp63 promote keratinocyte proliferation and differentiation in breeding tubercles of the zebrafish

Author: A Geling
A Rangarajan
A Vanhoutteghem
A Yang
AA Mills
AB Truong
AF Moorman
BC Nguyen
BK Padhi
Boris Fischer
C Bamberger
C Blanpain
C Byrne
C Caubet
C Fukazawa
C Parng
CB Kimmel
CP Crum
D Le Guellec
Dennis Roop
E Candi
E Proksch
E Wienholds
EK Suh
Elmon Schmelzer
F Rentzsch
FM Watt
H Lee
H Nakamura
J Bakkers
J Guinea-Viniegra
J Laurikkala
JA Mack
JR Huh
JY Bertrand
K Laue
K Lefort
KA Holbrook
KC Madison
KE King
KM Kwan
L Alibardi
L Galluzzi
LK Petersen
M Lamkanfi
M Matsuki
M Nicolas
Manuel Metzger
Matthias Hammerschmidt
MI Koster
MJ Hardman
MJ Parsons
ML Williams
MM Neff
N Scheer
NC Reich
NL Wu
OD Maddocks
P Wu
Philipp Knyphausen
R Okuyama
R Richardson
R Tomasini
RA Romano
Rainer Franzen
Rebecca Richardson
RTH Lee
S Berghmans
S Deasey
S Estrach
S Sidi
T Yugawa
Thomas J. Carney
Thomas Ramezani
TR Roberts
UJ Pyati
V Dötsch
V Link
V Ratushny
Wilhelm Bloch
X Guo
X Su
Y Sasaki
Z Gong
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2014

p63 is a multi-isoform member of the p53 family of transcription factors. There is compelling genetic evidence that ΔNp63 isoforms are needed for keratinocyte proliferation and stemness in the developing vertebrate epidermis. However, the role of TAp63 isoforms is not fully understood, and TAp63 knockout mice display normal epidermal development. Here, we show that zebrafish mutants specifically lacking TAp63 isoforms, or p53, display compromised development of breeding tubercles, epidermal appendages which, according to our analyses, display more advanced stratification and keratinization than regular epidermis, including continuous desquamation and renewal of superficial cells by derivatives of basal keratinocytes. Defects are further enhanced in TAp63/p53 double mutants, pointing to partially redundant roles of the two related factors. Molecular analyses, treatments with chemical inhibitors and epistasis studies further reveal the existence of a linear TAp63/p53->Notch->caspase 3 pathway required both for enhanced proliferation of keratinocytes at the base of the tubercles and for their subsequent differentiation in upper layers. Together, these studies identify the zebrafish breeding tubercles as specific epidermal structures sharing crucial features with the cornified mammalian epidermis. In addition, they reveal essential roles of TAp63 and p53 in promoting both keratinocyte proliferation and terminal differentiation through Notch signalling and caspase 3 activity, ensuring formation and proper homeostasis of this self-renewing stratified epithelium.

Crossref

Kölner UniversitätsPublikationsServer

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer

MPG.PuRe

Explore Bristol Research

FigShare